On Cluster Validity and the Information Need of Users
Authors
Abstract
In information retrieval, clustering algorithms are used to analyze large document collections with the objective of forming groups of similar documents. Clustering a document collection is an ambiguous task: a clustering, i.e., a set of document groups, depends on the chosen clustering algorithm as well as on the algorithm’s parameter settings. To find the best among several clusterings, it is common practice to evaluate their internal structures with a cluster validity measure. A clustering is considered useful to a user if particular structural properties are well developed. However, the presence of certain structural properties does not guarantee usefulness from an information retrieval standpoint, for example, whether the found document groups resemble the classification of a human editor. This paper investigates this point: based on already classified document collections, we generate clusterings and compare their predicted quality to their real quality. Our analysis includes the classical cluster validity measures of Dunn and Davies-Bouldin as well as the new graph-based measures Λ (weighted edge connectivity) and ρ (expected edge density). The experiments show that the classical measures behave consistently insofar as mediocre and poor clusterings are identified as such. On real-world document clustering data, however, they are clearly outperformed by the expected edge density ρ. This superiority of the graph-based measures can be explained by their independence of cluster forms and distances.
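To make the two classical measures named in the abstract concrete, the following is a minimal sketch of the Dunn index and the Davies-Bouldin index in their common textbook forms. It is not the authors' implementation: the function names, the use of Euclidean distance, and the toy 2D points (standing in for document vectors) are all assumptions made for illustration. The graph-based measures Λ and ρ are not sketched, since the abstract does not define them.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Smallest between-cluster distance divided by largest cluster diameter.

    Higher is better. Assumes at least two clusters and at least one
    cluster with two or more points (hypothetical helper, textbook variant).
    """
    min_between = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        math.dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return min_between / max_diameter


def centroid(cluster):
    """Coordinate-wise mean of the points in one cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def davies_bouldin_index(clusters):
    """Mean, over clusters, of the worst scatter-to-separation ratio.

    Lower is better. Scatter is the mean distance of a cluster's points
    to its centroid; separation is the distance between centroids.
    """
    centers = [centroid(c) for c in clusters]
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Toy data (an assumption for illustration): two well-separated groups,
# clustered once correctly and once with the points mixed up.
good = [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
poor = [[(0, 0), (10, 10), (1, 0)], [(0, 1), (10, 11), (11, 10)]]

for name, clustering in (("good", good), ("poor", poor)):
    print(name, dunn_index(clustering), davies_bouldin_index(clustering))
```

Both measures depend only on distances and cluster shapes, which is the property the abstract contrasts with the graph-based measures.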
Similar resources
Identification of parameters affecting the success of the hospital information system & presentation of a model for user satisfaction improvement
Complex institutions comprising several divisions and departments, such as hospitals, need access to information. A hospital information system has many capabilities, and if the system is accepted by hospital staff, it can lead to a revolution in the health care delivery industry. The identification of effective determinants and measures on the success of hospital information systems could sign...
Prediction of user's trustworthiness in web-based social networks via text mining
In social networks, users need a proper estimate of trust in others to be able to initiate reliable relationships. Some trust evaluation mechanisms have been offered, which use direct ratings to calculate or propagate trust values. However, in some web-based social networks where users only have binary relationships, there is no direct rating available. Therefore, a new method is required t...
Design, Implementation and Evaluation of Software to Increase Users’ Awareness and Facilitate the Identification of the Most Appropriate Centers Providing Laboratory Services in Tehran Province
Background and Aim: Medical diagnostic laboratories are among the most important centers in the treatment cycle of patients. Today, the conscious choice of such laboratories is one of the challenges that patients face in the treatment process. This study was conducted with the aim of improving the knowledge of software users in the field of laboratory sciences and also facilitating the consciou...
An Investigation on the User Behavior in Social Commerce Platforms: A Text Analytics Approach
Nowadays, the tourism industry accounts for approximately 10% of the global GDP, while it only contributes 3% of the economy in Iran. Since the pressure of US sanctions increases day after day on the Iranian economy, the necessity of paying attention to this industry as a source of foreign currency is felt more than ever. The purpose of this research is to analyze the reviews of users of social...
The Impact of Users’ Perception of Social Responsibility on the Usage of Public Library Services and Resources with the Mediated Role of Perceived Organizational Image
Purpose: Every organization’s commitment to its social responsibilities is a way to form a positive mental image among its audience that can affect their attitude towards using the organization’s services. In the present era, despite the various media with diverse capabilities in the field of providing and disseminating information, the field of information is experiencing intense competi...
The Effect of Perceived Privacy, Security, and Trust on Information-Sharing Behavior in Mobile Social Networks: The Moderating Role of Gender
The emergence of social networks has been one of the most important events of recent decades. One of the issues raised in these networks is how to establish trust. The purpose of this paper is to examine the impact of security, trust and privacy on information sharing in mobile social networks. The study also describes how users' gender moderates the privacy and security impact on trust. The current...